An evaluation of text classification methods for literary study
Abstract
This article presents an empirical evaluation of text classification methods in the literary domain. The study compared the performance of two popular algorithms, naïve Bayes and support vector machines (SVMs), in two literary text classification tasks: the eroticism classification of Dickinson’s poems and the sentimentalism classification of chapters in early American novels. The algorithms were also combined with three text pre-processing tools, namely stemming, stopword removal, and statistical feature selection, to study the impact of these tools on the classifiers’ performance in the literary setting. Existing studies outside the literary domain indicated that SVMs are generally better than naïve Bayes classifiers. However, in this study SVMs did not always win. Both algorithms achieved high accuracy in sentimental chapter classification, but the naïve Bayes classifier outperformed the SVM classifier in erotic poem classification. Self-feature selection helped both algorithms improve their performance in both tasks. However, the two algorithms selected relevant features in different frequency ranges, and therefore captured different characteristics of the target classes. The evaluation results in this study also suggest that arbitrary feature-reduction steps such as stemming and stopword removal should be taken very carefully. Some stopwords were highly discriminative features for Dickinson’s erotic poem classification. In sentimental chapter classification, stemming undermined subsequent feature selection by aggressively conflating and neutralizing discriminative features.
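The abstract's caution about stopword removal can be illustrated with a minimal sketch. It is not the study's code or data: the toy corpus, the labels `target` and `other`, the stopword list, and the function names `tokenize`, `train_nb`, and `score_nb` are all hypothetical, and the classifier is a plain multinomial naïve Bayes with add-one smoothing written from scratch. The sketch only shows how a classifier can lose its evidence when the discriminative words happen to be stopwords.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration (not data from the study).
TRAIN = [
    ("i wait for thee in the night", "target"),
    ("thee and i alone tonight", "target"),
    ("come to me my own", "target"),
    ("the bee visits the garden", "other"),
    ("a bird sits on the fence", "other"),
    ("the garden sleeps in frost", "other"),
]

# Hypothetical stopword list; real lists are much longer.
STOPWORDS = {"i", "me", "my", "thee", "the", "a", "in", "to", "and", "for"}


def tokenize(text, remove_stopwords=False):
    """Lowercase, split on whitespace, optionally drop stopwords."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def train_nb(examples, remove_stopwords=False):
    """Collect per-class token counts, per-class document counts, and the vocabulary."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text, remove_stopwords)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def score_nb(model, text, remove_stopwords=False):
    """Return the log score of each class: log prior plus smoothed log likelihoods."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    # Words never seen in training carry no evidence and are skipped.
    tokens = [t for t in tokenize(text, remove_stopwords) if t in vocab]
    scores = {}
    for label, n_docs in doc_counts.items():
        n_tokens = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            # Add-one (Laplace) smoothing over the vocabulary.
            score += math.log((word_counts[label][tok] + 1) / (n_tokens + len(vocab)))
        scores[label] = score
    return scores


if __name__ == "__main__":
    query = "i long for thee"
    for flag in (False, True):
        model = train_nb(TRAIN, remove_stopwords=flag)
        print("remove_stopwords =", flag, score_nb(model, query, remove_stopwords=flag))
```

In this toy setting the pronouns are the only words that separate the two classes for the query. Once they are filtered out, no known token remains and both classes fall back to their prior scores, so the classifier has nothing to decide on. Real corpora are far larger, so the effect there would be a matter of degree rather than a total loss of signal.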
Similar resources
Generic Analysis of Literary Translation: A Case Study of Contemporary English Short Stories
Translation of a literary text is a difficult task, for understanding literature requires knowledge of various linguistic levels of a literary text in addition to strategies and methods of translation. To this should still be added cognitive-based translation training which helps practitioners preserve the aesthetic aspects of a literary text. Focusing on short story as a genre with both ...
An Analysis of Social Systems in the Translation of The Great Gatsby
This article was written based on the key concept of polysystem theory which, in translating any literary text, emphasizes the transference of the social systems in which a text is embedded. As stated by Tynjanov (1978a), polysystem theory saw translated literature as a system operating in the larger social systems of the target text. Thus, the task of understanding as well as transferring such...
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
Importance and Position of Form and Writing Style in Philosophizing
Philosophical texts have emerged in diverse genres and literary forms. Thinkers take different stands on the importance of the elements of these works. Some consider literary forms to play an ornamental and accidental role, while others, in contrast, attribute philosophical implications to these elements. Philosophers’ approach to the role of literary elements of philosophical text is influential on ...
Literary Analysis in the Shadow of the Critique of New-Historicism: A Critical Review of Literary Analysis: The Basics
Hossein Payandeh translated Literary Analysis: The Basics into Farsi in 1396, by which time the original book had been on the market for 8 months. This book includes theoretical discussions and practical evidence on theory, literary criticism, and their relation to literary analysis. Although the author has presented his analysis in three distinct chapters and in the form of three met...
Journal: LLC
Volume: 23
Publication date: 2008